1. INTRODUCTION

Assessing handwriting in the Latin alphabet involves analyzing whether the geometric strokes that
compose each character are formed correctly. Each stroke of a letter has a specific direction, sequence,
length and curvature relative to the subsequent strokes in that letter. To facilitate teaching and learning,
word processor fonts are used to create the relevant teaching materials. Fonts such as Comic Sans MS,
Syazalina83v3, Century Gothic, Tw Cen MT, Tw Cen MT Condensed, Tw Cen MT Condensed Extra Bold
and Primetime are recommended by a group of Malaysian teachers for this purpose [1].
The Syazalina83v3 font was created by a primary school teacher and is based on printed Malaysian
textbooks [2]. The stroke formations of these fonts are frequently used to teach level one primary school
children aged six years old. Many attempts have been made to identify correct letter formation
automatically in order to speed up the assessment process.

Various methods have been used to recognize the correctness of handwritten letter
formation through stroke pattern decomposition, such as pattern recognition, artificial neural networks,
evolutionary algorithms and morphology algorithms. Pattern recognition based methods are popular for
solving problems related to the identification of text, numbers or images. In text studies, they are widely
used to identify shapes and strokes in Latin alphabets [3-4], Chinese strokes [5], Gurumukhi [6-8] and
Arabic characters [9-10]. Chain codes were applied by [11-13] to handwritten Arabic letters to detect the
order, total, likeness and direction of strokes. Chain codes were also used to compare the stroke formation
of handwritten Latin letters with the conventional rules of letter formation taught in school [3]. However,
chain codes are found to be susceptible to noise and to demand high memory and processing power [14].

An Artificial Neural Network (ANN) based method is another handwriting assessment approach used by
researchers. An ANN is a computational model that replicates the capability of the human brain's neural
networks to learn and relearn, as the network changes based on the input fed and the output produced. It
has been applied to analyze a person's personality based on the offline handwritten lowercase letter ‘t’
[15-16] and to assess children's handwriting based on the typology of stroke type, sequence and direction
of Latin letter formation [17-18]. The strokes are classified into three types of stroke patterns: straight
line, complex straight line and curve. Each stroke has its own range of back-propagation neural network
(BPNN) neuron values. The values of the tested handwriting are then compared with the BPNN neuron
values of the reference letter [17]. In another study, BPNN values and correlation analysis were used to
analyze the accuracy of six complex straight line Latin letter formations. BPNN is known for its accuracy
and versatility, but its accuracy depends on the amount of training data: the more training data supplied
to the BPNN, the more accurate the result. As a consequence, BPNN was found to be time-consuming
because of the need to train on large amounts of data and the complexity of processing [18].

Evolutionary algorithm (EA) based techniques, which are inspired by biological evolution, are also used
to detect letter strokes. The Genetic Algorithm (GA), one of the most widely applied EA techniques, is
based on the evolutionary ideas of natural selection and genetics. It was used to detect and extract
handwriting strokes and features [19] using the concept of a fitness function. The pattern recognition
result of a GA is highly dependent on the design of the fitness function; a poorly designed fitness function
results in inefficient or incomprehensible recognition output.

In general, morphology means the study of a particular form, shape or structure. Convex and
concave hulls are useful morphology concepts with a wide variety of application areas, such as pattern
recognition, image processing, statistics and classification tasks. However, it was found that the convex
hull cannot comprehensively identify the geometrical features of a shape [5]. In certain applications it
does not fully reflect the geometrical characteristics of a dataset because it does not follow the path of the
outermost points. The concave hull algorithm was introduced to overcome this drawback. The
concave hull is a more advanced approach that captures the exact shape of the surface of a
dataset; nevertheless, formulating the concave hull of a set is difficult [20]. Boundary extraction, the
Hit-or-Miss Transform (HMT) and region filling are other widely used applications of morphology
algorithms [21-22].

The HMT is a fundamental operation on binary images that has been widely used for 40
years [23]. It is a well-known morphological transform that provides an extremely powerful set of tools
for image processing. The inputs of the HMT are a binary image and a specifically designed template
called a structuring element (SE). An SE is a pre-defined template used to identify groups of
connected pixels that comply with certain geometric properties of the analyzed binary image, based on its
foreground and background. The accuracy of the algorithm depends greatly on the shape and size of the
SE [24]. This study is therefore conducted to find an appropriate general SE
decomposition for the HMT that can accurately extract and recognize Latin alphabet images.


2. RELATED WORKS 

Chea et al. [3] stated that Latin letters are combinations of stroke patterns categorized as
simple straight lines, complex straight lines and curved lines. Letter formations range from the
simplest elements, which are one or more vertical, horizontal or diagonal straight lines,
to more complex curved lines comprising a whole circle or a semi-circle. Fifteen Latin letters
consist of single-direction straight lines that combine vertical ( | ), horizontal ( _ ) or
diagonal ( / and \ ) strokes. These can be further divided into two groups: simple straight lines (A, E, F,
H, I, K, M, N, T, X, Y) and complex straight lines, which combine two or more lines within one single
stroke (L, V, W or Z). Three letters are made up entirely of curved lines: C, O and S. Letters such as B,
D, J, P, R and U are constructed from straight lines and curves, or semi-circles (bowls), connected in
various ways. Finally, two letters, G and Q, are essentially circular but include a short bar or spur
(straight or curled) that differentiates them from the similar curved letters C and O respectively.
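
The grouping described above can be written down as a small lookup table. The sketch below is only an illustration: the group names are labels chosen here, not terms from [3], and the check at the end simply confirms that the five groups cover the 26 uppercase letters exactly once.

```python
# Stroke-pattern groups for uppercase Latin letters, as listed in the text.
# The dictionary keys are illustrative labels, not terms taken from the source.
STROKE_GROUPS = {
    "simple_straight": set("AEFHIKMNTXY"),
    "complex_straight": set("LVWZ"),
    "curve_only": set("COS"),
    "straight_and_curve": set("BDJPRU"),
    "circular_with_spur": set("GQ"),
}


def group_of(letter):
    """Return the name of the group that contains the given uppercase letter."""
    for name, letters in STROKE_GROUPS.items():
        if letter in letters:
            return name
    raise ValueError(f"not an uppercase Latin letter: {letter!r}")


# Flatten the groups to check that every letter appears exactly once.
all_letters = [ch for letters in STROKE_GROUPS.values() for ch in letters]
print(len(all_letters), len(set(all_letters)))  # 26 26
print(group_of("W"))  # complex_straight
```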

The HMT is capable of identifying certain geometric properties based on the relative ordering of pixel
values defined by structuring elements. An SE is represented as a small matrix of pixels,
each with a value of 1 or 0. The dimensions of the matrix determine the overall size of the structuring
element, and its shape is determined by the pattern of ones and zeros. Usually the HMT uses a fixed SE
pair over the whole image and only extracts objects of the same size and shape from the foreground. Some
researchers describe an SE match as a ‘fit’ while others describe it as a ‘hit’ [25]. The HMT has been used
to recognize handwritten Bengali numerals [26-27], with a recognition accuracy of more than 96% for
most of the numerals. The program also identified each numeral quickly on average,
even on a very low-spec computer. Eugene and Edward [28] developed a class of structuring-element pairs
for segmentation-free character recognition via the morphological HMT for recognizing Courier font.
Both the hit and miss structuring elements are selected so that the hit-or-miss transform can be applied
across the test image without prior segmentation. Although they use the basic HMT method, it was shown
to achieve high accuracy on text and to be very robust with respect to the threshold level for the input
gray-scale data.
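
To make the matrix representation of an SE concrete, the following minimal sketch shows a 3 x 3 vertical-line SE and the two matching notions mentioned above. It assumes the common textbook definitions, which are not taken from [25]: an SE "fits" when every 1 in it lies over a foreground pixel, and "hits" when at least one does. The image and the function names are hypothetical.

```python
def overlap(image, se, top, left):
    """Yield the image pixels that lie under the 1-valued cells of the SE."""
    for r, row in enumerate(se):
        for c, cell in enumerate(row):
            if cell == 1:
                yield image[top + r][left + c]


def fits(image, se, top, left):
    """True when every 1 in the SE lies over a foreground pixel."""
    return all(p == 1 for p in overlap(image, se, top, left))


def hits(image, se, top, left):
    """True when at least one 1 in the SE lies over a foreground pixel."""
    return any(p == 1 for p in overlap(image, se, top, left))


# 3 x 3 vertical-line SE: the pattern of ones sets its shape.
SE_VERTICAL = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

# Hypothetical 4 x 4 binary image with a vertical stroke in column 1.
IMAGE = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

print(fits(IMAGE, SE_VERTICAL, 0, 0))  # True
print(fits(IMAGE, SE_VERTICAL, 1, 0))  # False
print(hits(IMAGE, SE_VERTICAL, 1, 0))  # True
```

The SE placed one row lower no longer fits, because its last cell falls on background, but it still hits the stroke.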

No literature on finding an appropriate general SE decomposition was found; however,
several studies describe the use of SEs for image recognition. Doh et al. [29] studied the choice of
SEs for recognizing a class of various objects. They start from two sets: a set of hit SEs that fit the
objects to be recognized and a set of miss SEs that fit the background. Their work resulted in the use of a
synthetic hit SE composed of the intersection of all hit SEs and a synthetic miss SE composed of the union
of all miss SEs for better recognition of diverse objects. Zhao and Daut [30] present a technique that uses
upper and lower bounds to determine the SEs for the HMT using a priori knowledge of the
shapes to be detected. This technique uses the skeletons of both the object to be recognized and its
complement as SEs.
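
The synthetic SE construction attributed to Doh et al. [29] can be sketched as element-wise AND and OR over binary masks. This is only an illustration of the intersection and union idea; the masks below are hypothetical and are not taken from [29].

```python
def intersect(masks):
    """Element-wise AND of equally sized binary masks."""
    return [[int(all(cells)) for cells in zip(*rows)] for rows in zip(*masks)]


def unite(masks):
    """Element-wise OR of equally sized binary masks."""
    return [[int(any(cells)) for cells in zip(*rows)] for rows in zip(*masks)]


# Hypothetical hit SEs for two objects: a plus sign and a vertical bar.
hit_plus = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
hit_bar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]

# Hypothetical miss SEs describing the background of each object.
miss_plus = [[1, 0, 1], [0, 0, 0], [1, 0, 1]]
miss_bar = [[1, 0, 1], [1, 0, 1], [1, 0, 1]]

synthetic_hit = intersect([hit_plus, hit_bar])
synthetic_miss = unite([miss_plus, miss_bar])

print(synthetic_hit)   # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(synthetic_miss)  # [[1, 0, 1], [1, 0, 1], [1, 0, 1]]
```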


3. METHODOLOGY 

The methodological approach taken in this study consists of four phases, as shown in Figure 1. The
initial phase is alphabet selection based on the complexity of stroke formation. This involves selecting
eight Latin letters divided into four groups based on their complexity. These groups are Simple Straight
Line (SSL), Curve Line (CL), Complex Straight Line (CSL) and a combination of SSL, CL and CSL
(CLSSL). Phase two is image pre-processing, which comprises two processes: binarization and thinning.
Phase three is shape recognition, which consists of two processes: manual measurement and the
hit-or-miss algorithm. The final phase is performance evaluation, in which the SE template sizes are
evaluated to find those that most appropriately describe the stroke formation of the chosen Latin letters.


[Figure 1 flowchart: Alphabet Selection, Preprocessing, Manual Measurement, Design Structuring Element, Counting Hit Stroke, Select Template]

Figure 1. Methodology for Selecting Appropriate General SE Decomposition


3.1 Alphabet Selection Based on Complexity of Stroke Formation 

The Latin alphabet comprises twenty-six letters composed of two main stroke formations,
straight lines and curved lines. In this study, uppercase letters are grouped into four categories of stroke
patterns: Simple Straight Line (SSL), Curve Line (CL), Complex Straight Line (CSL) [3] and a
combination of the three strokes (CLSSL). Two letters are selected from each category. The
selected letters are listed in Table 1 according to their complexity.


Table 1. Stroke Pattern and Selected Alphabets

Stroke Pattern                              Alphabets (Syazalina83v3)
Simple Straight Line (SSL)                  A, E
Curve Line (CL)                             C, J
Complex Straight Line (CSL)                 L, V
Combination of SSL, CL and CSL (CLSSL)      B, G


Letters A and E are selected to represent SSL, C and J to represent CL, L and V to represent CSL, and
B and G to represent CLSSL.


3.2 Pre-processing 

This section discusses the image pre-processing tasks shown in Figure 2. The eight
letters are cropped and saved in JPEG format as grayscale images. To increase the identification
accuracy of the letter strokes, the images are converted to binary format and then thinned.


[Figure 2 flow: Grey Scale Alphabet Image, Binarization (Otsu’s Method), Thinning Process, Alphabet’s Skeleton Image]

Figure 2. Pre-processing


The binarization process converts the grey-level image to a black and white image. Otsu’s
thresholding method was selected; it chooses the threshold that minimizes the intra-class variance. Next,
a morphological thinning process reduces the letter image to a single-pixel thickness. This reduces the
processing time and removes the possibility of detecting false trivial details [31]. Figure 3 shows the
result of binarization and thinning.
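
As a rough illustration of the binarization step, the sketch below implements Otsu's threshold selection in plain Python. It is not the implementation used in this study. It assumes 8-bit grey levels and dark strokes on a light background, so that stroke pixels are mapped to 1; the 3 x 3 image is a hypothetical example, and the thinning step is not shown.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises the between-class variance.

    Pixels with value <= t form one class and pixels > t the other.
    Maximising between-class variance is equivalent to minimising the
    intra-class variance.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of grey levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so no split at this level
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(image, threshold):
    """Map dark pixels (<= threshold) to 1 (stroke) and the rest to 0."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]


# Hypothetical grey-scale patch: a dark vertical stroke on a light background.
GREY = [
    [230, 25, 235],
    [228, 30, 240],
    [232, 20, 226],
]

t = otsu_threshold([p for row in GREY for p in row])
print(t)                  # 30
print(binarize(GREY, t))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```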





Figure 3. Binarization and Thinning of the Test Image 


3.3. Shape Recognition

This phase consists of two main processes: manual measurement and application of the hit-or-miss
algorithm.


3.3.1. Manual Measurement 

The manual measurement was done using a ruler. The binarized images of the eight letters were
printed and every stroke was measured in centimeters. The strokes are measured as vertical, horizontal,
left diagonal and right diagonal. These measurements are recorded in Table 2.


Table 2. Manual Measurement (cm)

Line Category                   Alphabet   Strokes |   Strokes -   Strokes /   Strokes \   Total
Simple Straight Line (SSL)      A          0           4           7.8         7.8         19.6
                                E          7.2         11.6        0           0           18.8
Curve Line (CL)                 C          2.5         4.9         3.4         3.9         14.7
                                J          6           4.4         1.5         2           13.9
Complex Straight Line (CSL)     L          6.8         3.4         0           0           10.2
                                V          0           0           7.6         7.6         15.2
Combination of SSL and CL       B          6.5         7.9         3.6         3.6         21.6
                                G          6.5         12          4.5         3.7         [illegible]


3.3.2. Hit or Miss Transform Algorithm 

The effectiveness of HMT detection depends heavily on the design of the SE, which must match
the structure of the objects to be detected.
1) Structuring Element Design 

Two main characteristics are directly related to an SE: shape and size. Shape is
crucial for recognizing an object, while size sets the observation scale and the criteria for differentiating
image objects and features.

As stated in the literature [3], the strokes of most Latin letters consist of a combination of vertical
( | ), horizontal ( _ ) or diagonal ( / and \ ) lines. The SEs are therefore designed in 2 x 2, 3 x 3 and
5 x 5 matrix sizes, each representing horizontal, vertical, left diagonal and right diagonal strokes. These
SEs are applied with the HMT algorithm, and the hit count is correlated with the manual measurement.
The designs are shown in Table 3.


Table 3. Structuring Element Shape and Size

Structuring Element Size    Structuring Element Shape
5 x 5                       Vertical, Horizontal, Left Diagonal, Right Diagonal
3 x 3                       Vertical, Horizontal, Left Diagonal, Right Diagonal
2 x 2                       Vertical, Horizontal, Left Diagonal, Right Diagonal
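
One possible way to generate line-shaped SEs of the sizes and orientations in Table 3 is sketched below. The exact pixel patterns used in the study are not recoverable from the table, so the layout here is an assumption: the line passes through the middle row or column (the second one for even sizes), "left diagonal" is taken to mean a line drawn like \ and "right diagonal" a line drawn like /.

```python
def line_se(size, orientation):
    """Build a size x size binary SE containing one straight line."""
    mid = size // 2
    se = [[0] * size for _ in range(size)]
    for i in range(size):
        if orientation == "vertical":
            se[i][mid] = 1
        elif orientation == "horizontal":
            se[mid][i] = 1
        elif orientation == "left_diagonal":   # assumed to be drawn like "\"
            se[i][i] = 1
        elif orientation == "right_diagonal":  # assumed to be drawn like "/"
            se[i][size - 1 - i] = 1
        else:
            raise ValueError(f"unknown orientation: {orientation}")
    return se


ORIENTATIONS = ["vertical", "horizontal", "left_diagonal", "right_diagonal"]

# One SE per size and orientation, giving twelve templates in total.
TEMPLATES = {(n, o): line_se(n, o) for n in (2, 3, 5) for o in ORIENTATIONS}

print(TEMPLATES[(3, "vertical")])        # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(TEMPLATES[(2, "right_diagonal")])  # [[0, 1], [1, 0]]
```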


2) Counting Strokes Using the Hit Algorithm

The hit process matches the SE to the intended pixels (1s), which represent the stroke image, and
rejects the unwanted pixels (0s) of the structure that it is meant to miss. The hit-or-miss algorithm is
executed using the SEs in Table 3. Only the hits where the SE fully matches the object structure are
counted.

The hit-or-miss transform is formulated as follows:

$$A \circledast B = (A \ominus X) \cap \left[ A^{c} \ominus (W - X) \right]$$

where $B$ denotes the set composed of $X$ and its background, $A \circledast B$ is the match/hit (or set
of matches/hits) of $B$ in $A$, $X$ is the set formed from elements of $B$ associated with an object, and
$(W - X)$ is the set formed from elements of $B$ associated with the corresponding background. Here
$\ominus$ denotes erosion and $A^{c}$ is the complement of $A$.
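
The formula can be illustrated with a small plain-Python sketch. It is not the implementation used in this study. It makes two simplifying assumptions: $W$ is the full SE window, so $(W - X)$ is the complement of $X$ inside that window, and matches are reported at the top-left corner of each window position that lies fully inside the image. The test image is hypothetical.

```python
def erode(image, se):
    """Binary erosion over window positions that lie inside the image.

    result[r][c] is 1 when every 1 of the SE, placed with its top-left
    corner at (r, c), covers a 1 in the image.
    """
    h, w = len(image), len(image[0])
    sh, sw = len(se), len(se[0])
    return [
        [
            int(all(image[r + i][c + j] == 1
                    for i in range(sh) for j in range(sw) if se[i][j] == 1))
            for c in range(w - sw + 1)
        ]
        for r in range(h - sh + 1)
    ]


def complement(mask):
    """Swap 0s and 1s in a binary mask."""
    return [[1 - v for v in row] for row in mask]


def hit_or_miss(image, x):
    """(A eroded by X) AND (A^c eroded by W - X), with W the full SE window."""
    hit = erode(image, x)
    miss = erode(complement(image), complement(x))
    return [[a & b for a, b in zip(hr, mr)] for hr, mr in zip(hit, miss)]


X_VERTICAL = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]

# Hypothetical image: a vertical stroke with one extra pixel at row 3, column 3.
A = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

matches = hit_or_miss(A, X_VERTICAL)
print(matches)                           # [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
print(sum(map(sum, matches)))            # 1
print(sum(map(sum, erode(A, X_VERTICAL))))  # 2
```

Erosion alone finds two positions where the vertical line fits, but the extra pixel next to the stroke violates the background condition at one of them, so the hit-or-miss count is 1.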

Initially, only SEs of the same window size are executed on all the selected letters. Later,
combinations of different window sizes are tested. To make results from different SE sizes comparable,
the results are expressed as percentages of the total count.
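
The percentage normalization can be sketched as follows. The manual values are those of letter A in Table 2; the hit counts are hypothetical numbers chosen for illustration and are not results from this study.

```python
def to_percentages(counts):
    """Convert per-direction counts or lengths to percentages of their total."""
    total = sum(counts.values())
    if total == 0:
        return {direction: 0.0 for direction in counts}
    return {direction: 100.0 * value / total for direction, value in counts.items()}


# Manual measurement of letter A from Table 2 (cm).
manual_a = {"vertical": 0, "horizontal": 4, "diag_right": 7.8, "diag_left": 7.8}

# Hypothetical HMT hit counts for the same letter (not from the study).
hits_a = {"vertical": 0, "horizontal": 10, "diag_right": 20, "diag_left": 20}

manual_pct = to_percentages(manual_a)
hits_pct = to_percentages(hits_a)

print({k: round(v, 1) for k, v in manual_pct.items()})
# {'vertical': 0.0, 'horizontal': 20.4, 'diag_right': 39.8, 'diag_left': 39.8}
print(hits_pct)
# {'vertical': 0.0, 'horizontal': 20.0, 'diag_right': 40.0, 'diag_left': 40.0}
```

Expressed this way, centimeters and pixel hit counts share a common scale and can be compared directly.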

3.4 Correlation between HMT and Manual Measurement

The collected results are correlated against the manual measurements. Pearson correlation is
used to assess the linearity of the relationship. The assessment is done by analyzing the strength of the
correlation coefficient: the stronger the correlation, the better the object description. The strength is
interpreted using the Evans correlation guide [32], shown in Table 4.


Table 4. Evans Correlation Guide 


Correlation Value Description 
0.00-0.19 Very Weak 
0.20-0.39 Weak 
0.40-0.59 Moderate 
0.60-0.79 Strong 
0.80-1.00 Very Strong 
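
A minimal sketch of this step computes the Pearson coefficient and maps it to a label from Table 4. Applying the guide to the absolute value of r is an assumption made here, and the two percentage vectors are illustrative: the first is letter A's manual measurement from Table 2 expressed as percentages, the second is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Assumes neither sequence is constant, otherwise the denominator is zero.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def evans_label(r):
    """Describe the strength of r using the bands in Table 4."""
    strength = abs(r)  # assumption: the guide is applied to |r|
    if strength < 0.20:
        return "Very Weak"
    if strength < 0.40:
        return "Weak"
    if strength < 0.60:
        return "Moderate"
    if strength < 0.80:
        return "Strong"
    return "Very Strong"


manual_pct = [0.0, 20.4, 39.8, 39.8]  # letter A, from Table 2
hmt_pct = [0.0, 20.0, 40.0, 40.0]     # hypothetical HMT percentages

r = pearson_r(manual_pct, hmt_pct)
print(evans_label(r))     # Very Strong
print(evans_label(0.59))  # Moderate
print(evans_label(0.60))  # Strong
```

Values of 0.59 and 0.60 fall on opposite sides of the boundary between the Moderate and Strong bands.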


4. RESULTS AND ANALYSIS 

Two SE combinations are found to be the best-fitting templates: 5x5 with 3x3 and 5x5 with 2x2. They
produced the same correlation coefficients for all letters except G, for which the values differ
slightly: r = 0.59 for 5x5 with 3x3 and r = 0.60 for 5x5 with 2x2.


5. CONCLUSION 

This study discusses the application of the morphological hit-or-miss algorithm to extracting alphabet
images. The proposed method succeeded in determining the most appropriate SE for feature
extraction. Based on the experiments, a combination of a small SE and a large SE is most appropriate
for detecting the stroke composition of alphabet images: the 2x2 SE is best for extracting diagonal
strokes and the 5x5 SE is best for extracting both vertical and horizontal strokes. The combined SE will
be used in future studies for detecting correct handwriting strokes and the legibility of handwriting in
the Latin alphabet among children in lower primary school.